As inflation continues to push prices higher, more and more Americans are investing in the stock market: the share of Americans who own stock rose nine percentage points between 2016 and 2023, from 52% to 61%. The market remains highly volatile, with events like the 2008 financial crisis and the COVID-19 pandemic in 2020 causing particularly violent spikes in market volatility. While it may be impossible to predict market swings of such magnitude, some researchers believe smaller price changes may be predictable through sentiment analysis.
With the rise of social media, there is no shortage of stock market discussion on platforms like X (formerly Twitter) and Reddit. Rather (in)famously, the GameStop short squeeze in early 2021 was driven in large part by the subreddit (Reddit discussion forum) r/wallstreetbets, which helped push GameStop's share price to roughly 30 times its initial value within the span of a month. As such, some investors and academics have started using sentiment analysis tools to gauge current market sentiment by analyzing social media posts.
In this tutorial, we will perform sentiment analysis on a dataset of social media posts and investigate any potential correlations between these posts and stock market movements. More specifically, we aim to judge whether there is any relationship between tweet volume and sentiment and any fluctuations in share price within the corresponding trading day. Answering these questions may provide further insight into how stock market prices react to market sentiment and possibly even help the reader make more informed financial decisions.
For this project, we found a nice dataset of tweets scraped from Twitter in 2021-2022 and the corresponding stock data for the same period of time. Before we get started working with these, we have to import a few libraries for working with the dataframes and visualizing our information later.
Make sure you run this before everything else!
# Libraries
import pandas as pd
import numpy as np
import yfinance as yf
import scipy
import warnings
import re
# Plotting Tools
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
# Misc other imports, mostly for analysis
from scipy.stats import f_oneway
from sklearn.feature_extraction.text import TfidfVectorizer
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
Once we've done that, we can read our data into two dataframes.
# These are tweets a couple years back representing stocks being mentioned by random people on the internet
stocktweet = pd.read_csv("stock_tweets.csv")
stocktweet
| Date | Tweet | Stock Name | Company Name | |
|---|---|---|---|---|
| 0 | 2022-09-29 23:41:16+00:00 | Mainstream media has done an amazing job at br... | TSLA | Tesla, Inc. |
| 1 | 2022-09-29 23:24:43+00:00 | Tesla delivery estimates are at around 364k fr... | TSLA | Tesla, Inc. |
| 2 | 2022-09-29 23:18:08+00:00 | 3/ Even if I include 63.0M unvested RSUs as of... | TSLA | Tesla, Inc. |
| 3 | 2022-09-29 22:40:07+00:00 | @RealDanODowd @WholeMarsBlog @Tesla Hahaha why... | TSLA | Tesla, Inc. |
| 4 | 2022-09-29 22:27:05+00:00 | @RealDanODowd @Tesla Stop trying to kill kids,... | TSLA | Tesla, Inc. |
| ... | ... | ... | ... | ... |
| 80788 | 2021-10-07 17:11:57+00:00 | Some of the fastest growing tech stocks on the... | XPEV | XPeng Inc. |
| 80789 | 2021-10-04 17:05:59+00:00 | With earnings on the horizon, here is a quick ... | XPEV | XPeng Inc. |
| 80790 | 2021-10-01 04:43:41+00:00 | Our record delivery results are a testimony of... | XPEV | XPeng Inc. |
| 80791 | 2021-10-01 00:03:32+00:00 | We delivered 10,412 Smart EVs in Sep 2021, rea... | XPEV | XPeng Inc. |
| 80792 | 2021-09-30 10:22:52+00:00 | Why can XPeng P5 deliver outstanding performan... | XPEV | XPeng Inc. |
80793 rows × 4 columns
# This is stock data for the stocks that were mentioned within some of the tweets in the first dataset
stock_data = pd.read_csv("stock_yfinance_data.csv")
stock_data
| Date | Open | High | Low | Close | Adj Close | Volume | Stock Name | |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-09-30 | 260.333344 | 263.043335 | 258.333344 | 258.493347 | 258.493347 | 53868000 | TSLA |
| 1 | 2021-10-01 | 259.466675 | 260.260010 | 254.529999 | 258.406677 | 258.406677 | 51094200 | TSLA |
| 2 | 2021-10-04 | 265.500000 | 268.989990 | 258.706665 | 260.510010 | 260.510010 | 91449900 | TSLA |
| 3 | 2021-10-05 | 261.600006 | 265.769989 | 258.066681 | 260.196655 | 260.196655 | 55297800 | TSLA |
| 4 | 2021-10-06 | 258.733337 | 262.220001 | 257.739990 | 260.916656 | 260.916656 | 43898400 | TSLA |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6295 | 2022-09-23 | 13.090000 | 13.892000 | 12.860000 | 13.710000 | 13.710000 | 28279600 | XPEV |
| 6296 | 2022-09-26 | 14.280000 | 14.830000 | 14.070000 | 14.370000 | 14.370000 | 27891300 | XPEV |
| 6297 | 2022-09-27 | 14.580000 | 14.800000 | 13.580000 | 13.710000 | 13.710000 | 21160800 | XPEV |
| 6298 | 2022-09-28 | 13.050000 | 13.421000 | 12.690000 | 13.330000 | 13.330000 | 31799400 | XPEV |
| 6299 | 2022-09-29 | 12.550000 | 12.850000 | 11.850000 | 12.110000 | 12.110000 | 33044800 | XPEV |
6300 rows × 8 columns
As you can see from the first few tweets, many tweets contain mentions (@username), which may confuse our sentiment analysis model later. So, we're going to get rid of those.
# Clearing out the @s for the tweets to make them clearer
# Define a function to remove @s from a tweet
def remove_mentions(tweet):
return re.sub(r'@\w+', '', tweet)
# Apply the function to the tweets
stocktweet['Tweet'] = stocktweet['Tweet'].apply(remove_mentions)
# Print the cleaned DataFrame to verify the changes
stocktweet
| Date | Tweet | Stock Name | Company Name | |
|---|---|---|---|---|
| 0 | 2022-09-29 23:41:16+00:00 | Mainstream media has done an amazing job at br... | TSLA | Tesla, Inc. |
| 1 | 2022-09-29 23:24:43+00:00 | Tesla delivery estimates are at around 364k fr... | TSLA | Tesla, Inc. |
| 2 | 2022-09-29 23:18:08+00:00 | 3/ Even if I include 63.0M unvested RSUs as of... | TSLA | Tesla, Inc. |
| 3 | 2022-09-29 22:40:07+00:00 | Hahaha why are you still trying to stop Tes... | TSLA | Tesla, Inc. |
| 4 | 2022-09-29 22:27:05+00:00 | Stop trying to kill kids, you sad deranged o... | TSLA | Tesla, Inc. |
| ... | ... | ... | ... | ... |
| 80788 | 2021-10-07 17:11:57+00:00 | Some of the fastest growing tech stocks on the... | XPEV | XPeng Inc. |
| 80789 | 2021-10-04 17:05:59+00:00 | With earnings on the horizon, here is a quick ... | XPEV | XPeng Inc. |
| 80790 | 2021-10-01 04:43:41+00:00 | Our record delivery results are a testimony of... | XPEV | XPeng Inc. |
| 80791 | 2021-10-01 00:03:32+00:00 | We delivered 10,412 Smart EVs in Sep 2021, rea... | XPEV | XPeng Inc. |
| 80792 | 2021-09-30 10:22:52+00:00 | Why can XPeng P5 deliver outstanding performan... | XPEV | XPeng Inc. |
80793 rows × 4 columns
Once we've cleaned up our tweets, we have to organize our data into one big dataframe that matches tweets about each stock to the corresponding stock data from the same trading day. We can accomplish this by merging the stock information dataframe into the stock tweet dataframe, which attaches the stock data to every applicable tweet. Then, we can sort our tweets by date and reindex them accordingly.
# Merging the two datasets to show tweets and corresponding stock price changes from the same day
# Convert Date columns to datetime without timezone information and without times (we only care about the date)
stocktweet['Date'] = pd.to_datetime(stocktweet['Date']).dt.tz_localize(None).dt.date
stock_data['Date'] = pd.to_datetime(stock_data['Date']).dt.tz_localize(None).dt.date
# Merge datasets on Date and Stock Name columns
merged_df = pd.merge(stocktweet, stock_data, on=['Date', 'Stock Name'], how='left')
# Sanitizing just in case
merged_df = merged_df.dropna(subset=['Adj Close'])
# Sort tweets by date tweeted and reindexing
merged_df.sort_values(by=["Date"], inplace = True)
merged_df.reset_index(inplace=True)
# Display the first and last few rows of the merged DataFrame to verify the dates are sorted
merged_df
| index | Date | Tweet | Stock Name | Company Name | Open | High | Low | Close | Adj Close | Volume | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80792 | 2021-09-30 | Why can XPeng P5 deliver outstanding performan... | XPEV | XPeng Inc. | 35.029999 | 36.110001 | 34.816002 | 35.540001 | 35.540001 | 6461500.0 |
| 1 | 37341 | 2021-09-30 | $TSLA Little teaser, more pictures soon 😍🚀🙌🏻\n... | TSLA | Tesla, Inc. | 260.333344 | 263.043335 | 258.333344 | 258.493347 | 258.493347 | 53868000.0 |
| 2 | 37340 | 2021-09-30 | UPDATE on Q3 Delivery Estimates:\n\n* FactSet ... | TSLA | Tesla, Inc. | 260.333344 | 263.043335 | 258.333344 | 258.493347 | 258.493347 | 53868000.0 |
| 3 | 37339 | 2021-09-30 | To set the record straight, my comments yester... | TSLA | Tesla, Inc. | 260.333344 | 263.043335 | 258.333344 | 258.493347 | 258.493347 | 53868000.0 |
| 4 | 37338 | 2021-09-30 | wow. FSD Beta 10.1 is incredibly good. Not per... | TSLA | Tesla, Inc. | 260.333344 | 263.043335 | 258.333344 | 258.493347 | 258.493347 | 53868000.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 63671 | 52442 | 2022-09-29 | Stocks I think entering intriguing levels to a... | GOOG | Alphabet Inc. | 99.300003 | 99.300003 | 96.519997 | 98.089996 | 98.089996 | 21921500.0 |
| 63672 | 52441 | 2022-09-29 | That's right everyone - $GOOG is officially a ... | GOOG | Alphabet Inc. | 99.300003 | 99.300003 | 96.519997 | 98.089996 | 98.089996 | 21921500.0 |
| 63673 | 52440 | 2022-09-29 | Top 10 $QQQ Holdings \n\nAnd Credit Rating\n\n... | GOOG | Alphabet Inc. | 99.300003 | 99.300003 | 96.519997 | 98.089996 | 98.089996 | 21921500.0 |
| 63674 | 111 | 2022-09-29 | What would I do as a new trader to become succ... | TSLA | Tesla, Inc. | 282.760010 | 283.649994 | 265.779999 | 268.209991 | 268.209991 | 77620600.0 |
| 63675 | 0 | 2022-09-29 | Mainstream media has done an amazing job at br... | TSLA | Tesla, Inc. | 282.760010 | 283.649994 | 265.779999 | 268.209991 | 268.209991 | 77620600.0 |
63676 rows × 11 columns
Now that our data is organized, we can get into doing some basic analysis to see what we're working with.
# Pranay Akula - ANOVA Test
# Print the average "Adj Close" values to see an overall view of the stocks and their average close price overall
avg_adj_close = merged_df.groupby('Stock Name')['Adj Close'].mean().reset_index()
print("Average 'Adj Close' for each stock:")
print(avg_adj_close, end="\n\n") # newline for cleanliness
# Get the count of unique stocks
unique_stock_count = merged_df['Stock Name'].nunique()
print(f"Number of unique stocks: {unique_stock_count}")
# Perform ANOVA test on the adjusted close values
stock_names = merged_df['Stock Name'].unique()
adj_close_data = [merged_df['Adj Close'][merged_df['Stock Name'] == stock] for stock in stock_names]
anova_result = f_oneway(*adj_close_data)
# Print ANOVA test result
print("\nANOVA test result:")
print(f"F-statistic: {anova_result.statistic}")
print(f"P-value: {anova_result.pvalue}")
Average 'Adj Close' for each stock:
   Stock Name   Adj Close
0        AAPL  158.949847
1         AMD  110.739538
2        AMZN  142.243956
3          BA  190.544477
4          BX  113.306495
5        COST  509.653180
6         CRM  204.565144
7         DIS  134.904651
8        ENPH  249.427267
9           F   15.872072
10       GOOG  127.573216
11       INTC   41.509379
12         KO   60.151648
13       META  256.918062
14       MSFT  288.570933
15       NFLX  332.772137
16        NIO   26.433655
17        NOC  431.256039
18         PG  149.419906
19       PYPL  154.344905
20       TSLA  306.104950
21        TSM  102.971784
22         VZ   47.069417
23       XPEV   38.841471
24         ZS  235.949231

Number of unique stocks: 25

ANOVA test result:
F-statistic: 13034.303769802094
P-value: 0.0
To start off, let us state our null hypothesis, and then our alternative hypothesis. Note that this ANOVA compares mean prices across stocks, so the hypotheses are about price levels:
$H_{0}$: The mean adjusted closing price is the same for every stock in our dataset.
$H_{A}$: At least one stock's mean adjusted closing price differs from the rest.
Based on the ANOVA test, the p-value is reported as 0.0, i.e. smaller than floating-point precision can represent, which is far below the typical significance level of 0.05. This indicates strong evidence against the null hypothesis. The F-statistic of roughly 13,034 is also very high, suggesting that the variation between group means is much larger than the variation within groups. Given the low p-value and high F-statistic, we reject the null hypothesis: there is a statistically significant difference in the average 'Adj Close' prices among the different stocks. In other words, the average adjusted closing prices are not the same across the 25 stocks in our dataset.
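To see why a large F-statistic corresponds to group means that clearly differ, here is a quick sanity check with `scipy.stats.f_oneway` on synthetic data (the group means, spreads, and sample sizes below are made up purely for illustration):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# Three synthetic "stocks" whose mean prices differ far more than their day-to-day spread
a = rng.normal(100, 5, 200)
b = rng.normal(150, 5, 200)
c = rng.normal(300, 5, 200)

result = f_oneway(a, b, c)
print(f"F = {result.statistic:.1f}, p = {result.pvalue:.3g}")
```

Because the between-group variation dwarfs the within-group variation, F is enormous and p is effectively zero, just like in our real output.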
# William Rubin - Sentiment Analysis Correlation to Percentage Change (Looking at first few tweets)
# Sample data
data = {
'Date': ['2022-08-30', '2021-12-16', '2021-10-25', '2021-10-18', '2022-02-09'],
'Tweet': [
"this is the most embarrassing thing you c...",
"FREE #OPTIONS Ideas 🤯\n\nScale out when above ...",
"What stocks are you watching this week? Beside...",
"Elite Options Watchlist 💡\n\n📈 $AMZN 3500C ove...",
"Win It Wednesday Triggers 🎯\n\n🌎 $GOOGL 2900c ..."
],
'Stock Name': ['TSLA', 'TSLA', 'TSLA', 'TSLA', 'MSFT'],
'Company Name': ['Tesla, Inc.', 'Tesla, Inc.', 'Tesla, Inc.', 'Tesla, Inc.', 'Microsoft Corporation'],
'Open': [287.869995, 331.500000, 316.843323, 283.929993, 309.869995],
'Close': [277.700012, 308.973328, 341.619995, 290.036682, 311.209991],
'Adj Close': [277.700012, 308.973328, 341.619995, 290.036682, 308.320984],
'Volume': [50541800, 82771500, 188556300, 72621600, 31284700]
}
sample_df = pd.DataFrame(data)
# Calculate Percentage Change
sample_df['Percentage Change'] = ((sample_df['Close'] - sample_df['Open']) / sample_df['Open']) * 100
# Sentiment Analysis (Manual Classification for simplicity)
sample_df['Sentiment'] = ['Negative', 'Positive', 'Neutral', 'Positive', 'Positive']
sample_df['Sentiment Score'] = sample_df['Sentiment'].map({'Negative': -1, 'Neutral': 0, 'Positive': 1})
# Correlation Analysis
correlation = sample_df['Sentiment Score'].corr(sample_df['Percentage Change'])
sample_df, correlation
( Date Tweet Stock Name \
0 2022-08-30 this is the most embarrassing thing you c... TSLA
1 2021-12-16 FREE #OPTIONS Ideas 🤯\n\nScale out when above ... TSLA
2 2021-10-25 What stocks are you watching this week? Beside... TSLA
3 2021-10-18 Elite Options Watchlist 💡\n\n📈 $AMZN 3500C ove... TSLA
4 2022-02-09 Win It Wednesday Triggers 🎯\n\n🌎 $GOOGL 2900c ... MSFT
Company Name Open Close Adj Close Volume \
0 Tesla, Inc. 287.869995 277.700012 277.700012 50541800
1 Tesla, Inc. 331.500000 308.973328 308.973328 82771500
2 Tesla, Inc. 316.843323 341.619995 341.619995 188556300
3 Tesla, Inc. 283.929993 290.036682 290.036682 72621600
4 Microsoft Corporation 309.869995 311.209991 308.320984 31284700
Percentage Change Sentiment Sentiment Score
0 -3.532839 Negative -1
1 -6.795376 Positive 1
2 7.819850 Neutral 0
3 2.150773 Positive 1
4 0.432438 Positive 1 ,
-0.0355172846627667)
Our next conclusion is based on percentage change together with a manually assigned sentiment for each tweet. In the sample above, the three key columns are Percentage Change, Sentiment, and Sentiment Score; the sentiment reflects the mood of the tweet sent at that date and time. The data focuses on TSLA for the first four rows (0-3), and TSLA appears inconsistent in how it responds to tweets. In row 0, TSLA had a negative tweet and the price went down, which is what we would usually expect. In row 1, however, TSLA had a positive tweet and the price went down even further. And in row 2 we see a "Neutral" tweet while TSLA jumps up by almost 8%. The near-zero correlation of about -0.036 tells the same story: in this tiny sample there is essentially no linear relationship between sentiment and price change. Tweets may have an influence at times, but TSLA in particular is erratic, and it is worth recognizing that this period coincided with considerable turmoil around Elon Musk, both at Tesla and at his rocket company SpaceX.
Our final takeaway concerns trading volumes. If tweet sentiment matters, we would expect positive-sentiment tweets to be associated with increased trading volumes, indicating heightened investor interest and activity; negative-sentiment tweets could likewise drive volumes higher as investors react to bad news and sell, pushing the price down further. This would highlight the influence of social media, particularly tweets, on trading volumes in financial markets, especially in today's market (given that we are looking at data from a couple of years back).
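As a quick sketch of how that volume claim could be checked, we can group by sentiment label and compare average trading volume. The frame below simply hard-codes the Sentiment and Volume values from the five sample rows above (far too few rows to draw a real conclusion, but it shows the shape of the computation):

```python
import pandas as pd

# Sentiment and Volume values from the five sample rows above
sample = pd.DataFrame({
    'Sentiment': ['Negative', 'Positive', 'Neutral', 'Positive', 'Positive'],
    'Volume': [50541800, 82771500, 188556300, 72621600, 31284700],
})

# Average trading volume for each sentiment label
avg_volume = sample.groupby('Sentiment')['Volume'].mean()
print(avg_volume)
```

On the full dataset the same groupby on `merged_df` would give a much more trustworthy comparison.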
Now that the data is cleaned, we can work on some visualization. First, we can group and count the tweets based on the stock mentioned.
#Laura Jia - Data visualization with pretty graphs
#Using seaborn (sns) and matplotlib (plt) to visualize data
plt.figure(figsize = (15, 10))
sns.set_style("dark")
plt.title('Tweets per Stock')
plt.xlabel('Stock')
plt.ylabel('Tweets')
tweet_ps = sns.countplot(x = 'Stock Name', data = merged_df, order = merged_df['Stock Name'].value_counts().index, palette=sns.color_palette('flare', n_colors=25))
sns.set_palette('flare')
sns.set()
As we can see, Tesla has the most tweets by a large margin, far surpassing its closest competitor, Taiwan Semiconductor Manufacturing.
We can see the specific number of tweets per stock as well:
#Sort stocks by number of tweets
group_sizes = merged_df.groupby('Stock Name').size().sort_values(ascending=False)
print("Number of tweets per stock:", group_sizes)
Number of tweets per stock:
Stock Name
TSLA    30028
TSM      7570
AAPL     4131
AMZN     3340
PG       3340
MSFT     3340
META     2317
NIO      2282
AMD      1796
NFLX     1464
GOOG     1053
PYPL      681
DIS       516
COST      280
BA        277
INTC      248
KO        210
CRM       173
XPEV      170
ENPH      150
ZS        143
VZ         82
BX         33
NOC        26
F          26
dtype: int64
Given that Tesla is the most tweeted about stock, is it also the stock with the most price fluctuation? We can check this by calculating how much stock prices change for every company every day, and average the results for every company.
#Calculating the average percentage difference between a stock's highest and lowest price per day
#Values are taken as absolute values since we are not currently differentiating between positive and negative
#Based on code from sentiment analysis above
#A new dataframe with a fluctuation column
pc_df = stock_data.copy()  # copy so we don't mutate the original stock_data
pc_df['Fluctuation'] = abs(((pc_df['High'] - pc_df['Low']) / pc_df['Low']) * 100)
#A copy of pc_df we're using here
pc_graph = pc_df.groupby('Stock Name')['Fluctuation'].mean().sort_values(ascending=False)
plt.figure(figsize = (15, 10))
sns.set_style("dark")
#Bit of a wordy title, unfortunately
plt.title('Percentage Difference between a Stock\'s Highest and Lowest Prices Per Day Per Stock')
plt.xlabel('Stock')
plt.ylabel('Percentage Difference')
pc_graph.plot(kind='bar')
So, it seems that Tesla is not the stock with the most fluctuation. Surprisingly, there seems to be little correlation between the number of tweets made about a company and its price change at all. Of course, we haven't yet started differentiating between "positive" and "negative" tweets, so for now we can only conclude that publicity does not necessarily mean bigger price changes, either up or down.
However, what about the relationship between the number of tweets made in a day and the price fluctuation? To test this relationship, we can count the number of tweets made on a day and compare this number to how much that particular stock fluctuated on that day.
pc_group = merged_df.copy()  # copy so the grouping below doesn't alter merged_df
pc_group['Fluctuation'] = abs(((pc_group['High'] - pc_group['Low']) / pc_group['Low']) * 100)
pc_group['Counts'] = 1
pc_group = pc_group.groupby(['Stock Name', 'Date']).agg({'Counts' : 'count', 'Fluctuation' : 'mean'})
plt.figure(figsize = (25, 10))
sns.set_style("dark")
plt.title('Price Fluctuation by Tweet Count')
plt.xlabel('Number of Tweets Made')
plt.ylabel('Fluctuation')
sns.scatterplot(data=pc_group, x="Counts", y="Fluctuation")
A much more interesting graph!
sns.lmplot(data=pc_group, x="Counts", y="Fluctuation")
Here, there does seem to be some positive correlation between tweets made per day and how much the price fluctuated by, so we can conclude that there is likely some relationship between how many tweets are made about a particular stock in a day and that stock's price fluctuation.
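To put a number on that visual trend, we could compute a Pearson correlation coefficient on the grouped data, e.g. `scipy.stats.pearsonr(pc_group['Counts'], pc_group['Fluctuation'])`. Here is a minimal sketch on synthetic data shaped like `pc_group`, with a deliberately weak positive relationship baked in (the slope and noise level are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
counts = rng.integers(1, 300, size=500).astype(float)      # tweets per stock-day (synthetic)
fluctuation = 0.01 * counts + rng.normal(0, 1, size=500)   # weak positive signal plus noise

r, p = pearsonr(counts, fluctuation)
print(f"r = {r:.3f}, p = {p:.3g}")
```

A moderate r with a tiny p-value is what a real but noisy relationship looks like, which matches the scattered-but-tilted pattern in the plot above.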
# Sanjit Thangarasu
# Analysis 1: Tweet Volume vs. Stock Volume
# Count the number of tweets per day per stock
tweet_volume = merged_df.groupby(['Date', 'Stock Name']).size().reset_index(name='Tweet Volume')
# Sum the stock volume per day per stock
stock_volume = merged_df.groupby(['Date', 'Stock Name'])['Volume'].sum().reset_index(name='Stock Volume')
# Merge the tweet volume and stock volume dataframes
volume_df = pd.merge(tweet_volume, stock_volume, on=['Date', 'Stock Name'])
# Plot Tweet Volume vs. Stock Volume
plt.figure(figsize=(12, 6))
sns.set(style="darkgrid")
sns.regplot(x='Tweet Volume', y='Stock Volume', data=volume_df, scatter_kws={'alpha':0.5})
plt.title('Tweet Volume vs. Stock Volume')
plt.xlabel('Tweet Volume')
plt.ylabel('Stock Volume')
plt.show()
# Analysis 2: Tesla Tweet Volume Over Time and Correlation with Stock Returns
# Filter data for Tesla (TSLA)
tesla_data = merged_df[merged_df['Stock Name'] == 'TSLA'].copy()
# Calculate daily tweet volume for Tesla
tesla_tweet_volume = tesla_data.groupby('Date').size().reset_index(name='Tweet Volume')
# Calculate daily returns for Tesla
tesla_data.loc[:, 'Daily Return'] = tesla_data['Adj Close'].pct_change()
tesla_returns = tesla_data.groupby('Date')['Daily Return'].mean().reset_index()
# Merge daily tweet volume with daily returns for Tesla
tesla_correlation_df = pd.merge(tesla_tweet_volume, tesla_returns, on='Date')
# Plot Tweet Volume vs. Stock Returns for Tesla
fig, ax1 = plt.subplots(figsize=(12, 6))
color = 'tab:blue'
ax1.set_xlabel('Date')
ax1.set_ylabel('Tweet Volume', color=color)
ax1.plot(tesla_correlation_df['Date'], tesla_correlation_df['Tweet Volume'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Daily Return', color=color)
ax2.plot(tesla_correlation_df['Date'], tesla_correlation_df['Daily Return'], color=color)
ax2.tick_params(axis='y', labelcolor=color)
plt.title('Tesla: Tweet Volume and Stock Returns Over Time')
fig.tight_layout()
plt.show()
# Calculate and print correlation for Tesla
tesla_correlation = tesla_correlation_df['Tweet Volume'].corr(tesla_correlation_df['Daily Return'])
print(f"Correlation between Tesla Tweet Volume and Stock Returns: {tesla_correlation}")
Correlation between Tesla Tweet Volume and Stock Returns: -0.0230282294465532
To compare our tweets and data, we need to do a quick sentiment analysis first.
## Sentiment Analysis and Stock Price Prediction - Pranay Akula ##
def preprocess_text(text):
text = re.sub(r'\W', ' ', text)
text = re.sub(r'\s+', ' ', text)
text = text.lower()
return text
# Apply preprocessing
merged_df['Processed_Tweet'] = merged_df['Tweet'].apply(preprocess_text)
# Define the vectorizer
vectorizer = TfidfVectorizer(max_features=3000)
# Transform the processed tweets to TF-IDF features
X = vectorizer.fit_transform(merged_df['Processed_Tweet'])
# Using TextBlob for sentiment analysis
merged_df['Predicted_Sentiment'] = merged_df['Tweet'].apply(lambda tweet: TextBlob(tweet).sentiment.polarity)
merged_df['Predicted_Sentiment_Label'] = merged_df['Predicted_Sentiment'].apply(lambda x: 1 if x > 0 else 0 if x == 0 else -1)
print(merged_df.head())
index Date Tweet \
0 80792 2021-09-30 Why can XPeng P5 deliver outstanding performan...
1 37341 2021-09-30 $TSLA Little teaser, more pictures soon 😍🚀🙌🏻\n...
2 37340 2021-09-30 UPDATE on Q3 Delivery Estimates:\n\n* FactSet ...
3 37339 2021-09-30 To set the record straight, my comments yester...
4 37338 2021-09-30 wow. FSD Beta 10.1 is incredibly good. Not per...
Stock Name Company Name Open High Low Close \
0 XPEV XPeng Inc. 35.029999 36.110001 34.816002 35.540001
1 TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347
2 TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347
3 TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347
4 TSLA Tesla, Inc. 260.333344 263.043335 258.333344 258.493347
Adj Close Volume Fluctuation Counts \
0 35.540001 6461500.0 3.716678 1
1 258.493347 53868000.0 1.823222 1
2 258.493347 53868000.0 1.823222 1
3 258.493347 53868000.0 1.823222 1
4 258.493347 53868000.0 1.823222 1
Processed_Tweet Predicted_Sentiment \
0 why can xpeng p5 deliver outstanding performan... 0.187500
1 tsla little teaser more pictures soon https t... 0.156250
2 update on q3 delivery estimates factset 204k w... 0.000000
3 to set the record straight my comments yesterd... 0.266667
4 wow fsd beta 10 1 is incredibly good not perfe... 0.162500
Predicted_Sentiment_Label
0 1
1 1
2 0
3 1
4 1
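The thresholding rule applied above (polarity > 0 maps to 1, exactly 0 maps to 0, anything negative maps to -1) can be sanity-checked in isolation, without TextBlob:

```python
def sentiment_label(polarity):
    # Same rule as the lambda above: positive polarity -> 1, exactly zero -> 0, negative -> -1
    return 1 if polarity > 0 else 0 if polarity == 0 else -1

labels = [sentiment_label(p) for p in (0.1875, 0.15625, 0.0, -0.4)]
print(labels)  # -> [1, 1, 0, -1]
```

The first three polarities are taken from the head() output above, and the labels match the Predicted_Sentiment_Label column.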
With that done, we can get into trying to train a model that can accurately predict stock prices based on our tweet data. First, why don't we train a linear regression model?
Note: You don't have to do this step. This is just an example, for educational purposes.
Here, we will just be applying our model to TSLA, as it has the most available tweets.
# Prepare the data for price prediction
merged_df['Fluctuation'] = abs(((merged_df['High'] - merged_df['Low']) / merged_df['Low']) * 100)
price_df_TSLA = merged_df[merged_df['Stock Name'] == 'TSLA'].groupby('Date').agg({
'Predicted_Sentiment': 'mean',
'Fluctuation': 'mean'
}).reset_index()
# Feature and target variables
X_price = price_df_TSLA[['Predicted_Sentiment']]
y_price = price_df_TSLA['Fluctuation']
# Split data into train and test sets for price prediction
X_train_price, X_test_price, y_train_price, y_test_price = train_test_split(X_price, y_price, test_size=0.2, random_state=42)
# Train the Linear Regression model
price_model = LinearRegression()
price_model.fit(X_train_price, y_train_price)
# Predict stock price fluctuation
price_df_TSLA['Predicted_Price'] = price_model.predict(price_df_TSLA[['Predicted_Sentiment']])
# Plot actual vs. predicted prices
plt.figure(figsize=(14, 7))
plt.plot(price_df_TSLA['Date'], price_df_TSLA['Fluctuation'], label='Actual Price', color='b')
plt.plot(price_df_TSLA['Date'], price_df_TSLA['Predicted_Price'], label='Predicted Price', color='r', linestyle='--')
#Good old wordy title, back at it again
plt.title('TSLA Actual vs. Predicted Percent Difference Between Daily Highest and Lowest Prices')
plt.xlabel('Date')
plt.ylabel('Percent difference')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.show()
Wow! It didn't work at all!
So, what went wrong? Linear regression works by fitting a dependent variable against one or more independent variables. Here, the only feature is the daily average sentiment, which turns out to carry almost no information about that day's price fluctuation, so the model predicts something close to a constant. On top of that, stock prices evolve over time, and a plain regression treats each day as a completely independent data point, ignoring the sequence entirely.
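One way to confirm the model learned essentially nothing is to compare its error against a predict-the-mean baseline using `mean_absolute_error` (imported at the top but not yet used). The sketch below uses synthetic data standing in for our TSLA frame, with fluctuation deliberately unrelated to sentiment:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
sentiment = rng.uniform(-1, 1, size=(200, 1))   # stand-in for mean daily sentiment
fluctuation = rng.normal(3, 1, size=200)        # fluctuation carries no sentiment signal

model = LinearRegression().fit(sentiment, fluctuation)
mae_model = mean_absolute_error(fluctuation, model.predict(sentiment))
mae_baseline = mean_absolute_error(fluctuation, np.full(200, fluctuation.mean()))
print(mae_model, mae_baseline)  # nearly identical: the feature adds no predictive power
```

When a fitted model barely beats "always guess the average", the feature set, not the optimizer, is the problem.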
To solve this problem, we can train a model that does treat time as a continuous block: an LSTM model.
A Long Short-Term Memory (LSTM) model is a type of neural network designed to handle sequential data, such as data recorded over a continuous period of time, which is exactly what stock data is. So, to tackle our particular problem, we must pivot away from our regression model and start building a neural network.
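To make that concrete, the core of an LSTM is a gated cell update that decides how much old memory to keep and how much new information to write at each time step. The sketch below is plain NumPy with random, untrained weights, purely to illustrate the mechanics (a real model would use a framework like Keras or PyTorch):

```python
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; W, U, b stack the input/forget/output/candidate weights."""
    n = h.size
    z = W @ x + U @ h + b          # all four pre-activations at once
    i = sigmoid(z[:n])             # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:])         # candidate cell state
    c_new = f * c + i * g          # blend old memory with the new candidate
    h_new = o * np.tanh(c_new)     # hidden state seen by the next step
    return h_new, c_new

# One step with random weights: 3 input features, hidden size 4
rng = np.random.default_rng(0)
m, n = 3, 4
W, U, b = rng.normal(size=(4 * n, m)), rng.normal(size=(4 * n, n)), np.zeros(4 * n)
h, c = lstm_step(rng.normal(size=m), np.zeros(n), np.zeros(n), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The cell state `c` is the "long-term" memory carried across many steps, while the gates learn what to forget and what to pass along, which is what makes LSTMs suited to price sequences.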
Before we can start, we need to preprocess our data. First, we use StandardScaler from scikit-learn to normalize each stock's features.
Next, we apply Principal Component Analysis (PCA) to reduce the dimensionality of our data. It looks fine right now, but nine features per stock is more than we need.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import os
# Function to preprocess each stock's data
def preprocess_stock_data(df, stock_name):
stock_df = df[df['Stock Name'] == stock_name].copy()
columns_to_keep = ['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Fluctuation', 'Predicted_Sentiment']
stock_df = stock_df[columns_to_keep]
stock_df.set_index('Date', inplace=True)
scaler = StandardScaler()
normalized_stock_df = pd.DataFrame(scaler.fit_transform(stock_df), columns=stock_df.columns, index=stock_df.index)
return normalized_stock_df
# Get unique stock names
stock_names = merged_df['Stock Name'].unique()
# Dictionary to hold the preprocessed data for each stock
preprocessed_data = {}
for stock_name in stock_names:
preprocessed_data[stock_name] = preprocess_stock_data(merged_df, stock_name)
# Apply PCA
def apply_pca(df: pd.DataFrame, variance_threshold: float = 0.95):
pca = PCA()
pca.fit(df)
# Select the number of components that explain the desired variance
cumulative_variance = pca.explained_variance_ratio_.cumsum()
num_components = next(i for i, total_variance in enumerate(cumulative_variance) if total_variance >= variance_threshold) + 1
# Apply PCA with the selected number of components
pca = PCA(n_components=num_components)
transformed_data = pca.fit_transform(df)
# Convert transformed data back to a DataFrame
pca_df = pd.DataFrame(transformed_data, index=df.index, columns=[f'PC{i+1}' for i in range(num_components)])
return pca_df, cumulative_variance
# Plot cumulative explained variance ratios for all tickers
def plot_cumulative_variances(cumulative_variances: dict):
plt.figure(figsize=(15, 8))
for ticker, cumulative_variance in cumulative_variances.items():
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', label=ticker)
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Explained Variance by Number of Components for Different Tickers')
plt.axhline(y=0.95, color='r', linestyle='--')
plt.legend()
save_path = 'outputs/graphs/cumulative_explained_variance.sample.png'
os.makedirs(os.path.dirname(save_path), exist_ok=True)
plt.savefig(save_path)
plt.show()
plt.close()
# Apply PCA to each ticker in the stock data
def apply_pca_stock_data(stock_data: dict, variance_threshold: float = 0.95):
tickers = list(stock_data.keys())
cumulative_variances = {}
for ticker in tickers:
stock_data[ticker]['pca_data'], cumulative_variance = apply_pca(stock_data[ticker]['normalized_data'], variance_threshold)
cumulative_variances[ticker] = cumulative_variance
print(f"PCA applied to {ticker} ✅")
return stock_data, cumulative_variances
# Initialize stock data
stock_data = {stock_name: {'normalized_data': data} for stock_name, data in preprocessed_data.items()}
# Apply PCA to stock data
stock_data, cumulative_variances = apply_pca_stock_data(stock_data)
# Plot the cumulative explained variance ratios
plot_cumulative_variances(cumulative_variances)
PCA applied to XPEV ✅ PCA applied to TSLA ✅ PCA applied to NIO ✅ PCA applied to DIS ✅ PCA applied to TSM ✅ PCA applied to AAPL ✅ PCA applied to AMD ✅ PCA applied to GOOG ✅ PCA applied to AMZN ✅ PCA applied to META ✅ PCA applied to PG ✅ PCA applied to MSFT ✅ PCA applied to NFLX ✅ PCA applied to CRM ✅ PCA applied to ZS ✅ PCA applied to ENPH ✅ PCA applied to PYPL ✅ PCA applied to COST ✅ PCA applied to BA ✅ PCA applied to KO ✅ PCA applied to F ✅ PCA applied to INTC ✅ PCA applied to BX ✅ PCA applied to NOC ✅ PCA applied to VZ ✅
From PCA and the elbow in our lovely little cumulative-variance graph, we can see that the optimal number of components lies around 3-4, though it varies for every stock. Speaking of variation between stocks: every stock has a different volume of tweets, data, and patterns associated with it. So, we must train a separate model for every single one of our stocks.
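As a sanity check of the selection rule inside apply_pca, here it is on synthetic data with two strong directions and three near-noise directions (np.searchsorted is just a compact equivalent of the generator expression used above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: two high-variance columns, three near-noise columns
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 0.1, 0.1, 0.1])

pca = PCA().fit(X)
cumvar = pca.explained_variance_ratio_.cumsum()
# Smallest number of components whose cumulative variance reaches 95%
n = int(np.searchsorted(cumvar, 0.95)) + 1
print(n)  # 2 — the two strong directions already explain >95% of the variance
```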
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Function to create sequences for LSTM
def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i:i + seq_length])
    return np.array(sequences)

sequence_length = 60  # Length of the sequences for LSTM

# Prepare the data for each stock
lstm_data = {}
for stock_name, data in stock_data.items():
    pca_data = data['pca_data']
    # Create sequences
    sequences = create_sequences(pca_data.values, sequence_length)
    if sequences.size == 0:
        print(f"Skipping {stock_name} due to insufficient data for sequences.")
        continue
    # Split into features and target
    X = sequences[:, :-1]
    y = sequences[:, -1, 0]  # Predicting the first principal component as target
    # Split into training and testing sets
    # Note: randomly shuffling overlapping windows can leak future data into training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    lstm_data[stock_name] = {
        'X_train': X_train,
        'X_test': X_test,
        'y_train': y_train,
        'y_test': y_test
    }

# Display the shape of the training data for all stocks
print("\nShape of the training data for each stock:")
for stock_name, data in lstm_data.items():
    print(f"Training data for {stock_name}: X_train shape: {data['X_train'].shape}, y_train shape: {data['y_train'].shape}")
Skipping F due to insufficient data for sequences.
Skipping BX due to insufficient data for sequences.
Skipping NOC due to insufficient data for sequences.

Shape of the training data for each stock:
Training data for XPEV: X_train shape: (88, 59, 3), y_train shape: (88,)
Training data for TSLA: X_train shape: (23974, 59, 3), y_train shape: (23974,)
Training data for NIO: X_train shape: (1777, 59, 3), y_train shape: (1777,)
Training data for DIS: X_train shape: (364, 59, 3), y_train shape: (364,)
Training data for TSM: X_train shape: (6008, 59, 3), y_train shape: (6008,)
Training data for AAPL: X_train shape: (3256, 59, 3), y_train shape: (3256,)
Training data for AMD: X_train shape: (1388, 59, 3), y_train shape: (1388,)
Training data for GOOG: X_train shape: (794, 59, 3), y_train shape: (794,)
Training data for AMZN: X_train shape: (2624, 59, 3), y_train shape: (2624,)
Training data for META: X_train shape: (1805, 59, 4), y_train shape: (1805,)
Training data for PG: X_train shape: (2624, 59, 4), y_train shape: (2624,)
Training data for MSFT: X_train shape: (2624, 59, 3), y_train shape: (2624,)
Training data for NFLX: X_train shape: (1123, 59, 3), y_train shape: (1123,)
Training data for CRM: X_train shape: (90, 59, 4), y_train shape: (90,)
Training data for ZS: X_train shape: (66, 59, 3), y_train shape: (66,)
Training data for ENPH: X_train shape: (72, 59, 3), y_train shape: (72,)
Training data for PYPL: X_train shape: (496, 59, 3), y_train shape: (496,)
Training data for COST: X_train shape: (176, 59, 3), y_train shape: (176,)
Training data for BA: X_train shape: (173, 59, 3), y_train shape: (173,)
Training data for KO: X_train shape: (120, 59, 4), y_train shape: (120,)
Training data for INTC: X_train shape: (150, 59, 4), y_train shape: (150,)
Training data for VZ: X_train shape: (17, 59, 3), y_train shape: (17,)
As we can see, not all of our stocks have enough tweets to train a good, functional model. Unfortunately, this means we have to drop F, BX, and NOC.
For our remaining 22 stocks, we can start training our LSTMs.
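Before training, it's worth double-checking what create_sequences actually produces. Here's a toy run (array sizes chosen arbitrarily) showing how the windows turn into features and targets:

```python
import numpy as np

def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i:i + seq_length])
    return np.array(sequences)

# Toy input: 10 timesteps of 3 PCA components, window length 5
toy = np.arange(30).reshape(10, 3)
seqs = create_sequences(toy, 5)
print(seqs.shape)  # (5, 5, 3): 10 - 5 = 5 windows, each 5 timesteps long

# Same feature/target split as in the real pipeline:
# all but the last timestep are features, the last timestep's PC1 is the target
X, y = seqs[:, :-1], seqs[:, -1, 0]
print(X.shape, y.shape)  # (5, 4, 3) (5,)
```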
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense, Dropout
from keras.regularizers import l2
from keras.callbacks import EarlyStopping

# Function to build LSTM model
def build_lstm_model(input_shape):
    model = Sequential()
    model.add(Input(shape=input_shape))
    # First LSTM layer with L2 regularization and dropout
    model.add(LSTM(50, return_sequences=True, kernel_regularizer=l2(0.001)))
    model.add(Dropout(0.3))
    # Second LSTM layer with L2 regularization and dropout
    model.add(LSTM(50, return_sequences=False, kernel_regularizer=l2(0.001)))
    model.add(Dropout(0.3))
    # Fully connected layers
    model.add(Dense(25, kernel_regularizer=l2(0.001)))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# Training the LSTM model for each stock
for stock_name, data in lstm_data.items():
    X_train, y_train = data['X_train'], data['y_train']
    input_shape = (X_train.shape[1], X_train.shape[2])
    model = build_lstm_model(input_shape)
    # Early stopping callback (must be passed to fit() via callbacks to take effect)
    early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
    # Train the model
    model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.2, callbacks=[early_stopping])
    lstm_data[stock_name]['model'] = model
    print(f"LSTM model trained for {stock_name} ✅")
[Training logs truncated: each model trains for 10 epochs, printing loss and val_loss per epoch. Validation loss generally falls over training, with final values ranging from ~0.05 for TSLA (23,974 training sequences) to ~1.95 for VZ (just 17). LSTM model trained for XPEV ✅ … LSTM model trained for VZ ✅]
Here we can see the shape of the data as we go through the model:
lstm_model = lstm_data['TSLA']['model']
print("LSTM model summary for TSLA:")
lstm_model.summary()
LSTM model summary for TSLA:
Model: "sequential_89"
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Layer (type)            ┃ Output Shape     ┃   Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ lstm_178 (LSTM)         │ (None, 59, 50)   │    10,800 │
├─────────────────────────┼──────────────────┼───────────┤
│ dropout_178 (Dropout)   │ (None, 59, 50)   │         0 │
├─────────────────────────┼──────────────────┼───────────┤
│ lstm_179 (LSTM)         │ (None, 50)       │    20,200 │
├─────────────────────────┼──────────────────┼───────────┤
│ dropout_179 (Dropout)   │ (None, 50)       │         0 │
├─────────────────────────┼──────────────────┼───────────┤
│ dense_178 (Dense)       │ (None, 25)       │     1,275 │
├─────────────────────────┼──────────────────┼───────────┤
│ dense_179 (Dense)       │ (None, 1)        │        26 │
└─────────────────────────┴──────────────────┴───────────┘
Total params: 96,905 (378.54 KB)
Trainable params: 32,301 (126.18 KB)
Non-trainable params: 0 (0.00 B)
Optimizer params: 64,604 (252.36 KB)
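Those parameter counts aren't magic, and we can check them by hand. A Keras LSTM layer has four gates, each with an input kernel (input_dim × units), a recurrent kernel (units × units), and a bias (units); a Dense layer is just weights plus biases:

```python
# An LSTM layer's weights: 4 gates, each with input kernel, recurrent kernel, bias
def lstm_params(input_dim, units):
    return 4 * units * (input_dim + units + 1)

# A Dense layer's weights: one kernel plus one bias per output unit
def dense_params(input_dim, units):
    return input_dim * units + units

total = (lstm_params(3, 50)      # 10,800 — first LSTM, 3 PCA inputs
         + lstm_params(50, 50)   # 20,200 — second LSTM
         + dense_params(50, 25)  # 1,275
         + dense_params(25, 1))  # 26
print(total)  # 32301, matching "Trainable params" in the summary
```

(The remaining 64,604 "optimizer params" are Adam's two moment estimates per trainable weight.)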
from sklearn.metrics import mean_squared_error

# Evaluate the LSTM model for each stock
for stock_name, data in lstm_data.items():
    X_test, y_test = data['X_test'], data['y_test']
    model = data['model']
    # Make predictions
    predictions = model.predict(X_test)
    # Calculate RMSE
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    print(f"RMSE for {stock_name}: {rmse:.4f}")
RMSE for XPEV: 0.7460
RMSE for TSLA: 0.2047
RMSE for NIO: 0.2460
RMSE for DIS: 0.4599
RMSE for TSM: 0.1239
RMSE for AAPL: 0.2290
RMSE for AMD: 0.2310
RMSE for GOOG: 0.3587
RMSE for AMZN: 0.1984
RMSE for META: 0.2207
RMSE for PG: 0.2244
RMSE for MSFT: 0.2111
RMSE for NFLX: 0.2363
RMSE for CRM: 0.5504
RMSE for ZS: 0.5845
RMSE for ENPH: 0.3426
RMSE for PYPL: 0.2401
RMSE for COST: 0.5712
RMSE for BA: 0.8761
RMSE for KO: 0.9238
RMSE for INTC: 0.3939
RMSE for VZ: 1.4876
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

# Evaluate the LSTM model for each stock and plot results
for stock_name, data in lstm_data.items():
    X_test, y_test = data['X_test'], data['y_test']
    model = data['model']
    # Make predictions
    predictions = model.predict(X_test)
    # Calculate RMSE
    rmse = np.sqrt(mean_squared_error(y_test, predictions))
    print(f"RMSE for {stock_name}: {rmse:.4f}")
    # Plot predicted vs actual results
    plt.figure(figsize=(14, 7))
    if len(y_test) > 100:
        plt.plot(y_test[:100], label='Actual')
        plt.plot(predictions[:100], label='Predicted')
        plt.title(f'Predicted vs Actual for {stock_name} (Capped at 100 data points)')
    else:
        plt.plot(y_test, label='Actual')
        plt.plot(predictions, label='Predicted')
        plt.title(f'Predicted vs Actual for {stock_name}')
    plt.xlabel('Time')
    plt.ylabel('First Principal Component')
    plt.legend()
    plt.show()
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step RMSE for XPEV: 0.7460
188/188 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step RMSE for TSLA: 0.2047
14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for NIO: 0.2460
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step RMSE for DIS: 0.4599
47/47 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for TSM: 0.1239
26/26 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step RMSE for AAPL: 0.2290
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step RMSE for AMD: 0.2310
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for GOOG: 0.3587
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for AMZN: 0.1984
15/15 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for META: 0.2207
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step RMSE for PG: 0.2244
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for MSFT: 0.2111
9/9 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step RMSE for NFLX: 0.2363
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step RMSE for CRM: 0.5504
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step RMSE for ZS: 0.5845
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step RMSE for ENPH: 0.3426
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step RMSE for PYPL: 0.2401
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step RMSE for COST: 0.5712
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step RMSE for BA: 0.8761
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step RMSE for KO: 0.9238
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step RMSE for INTC: 0.3939
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step RMSE for VZ: 1.4876
After a lot of training, our models seem to be doing very, very well according to the loss numbers! Unfortunately, when we look at the graphs, they are also very, very overfitted.
What does this mean? Overfitting can be caused by a bunch of different things (too little data, too complex a model, etc.). In this case, though, it's probably because there are simply no patterns to be found at all. We tried a lot of fixes (adding L2 regularization, adding early stopping, training everything a second time, asking the model very nicely), and none of them helped significantly.
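One more possible contributor worth ruling out if you revisit this: the random train_test_split earlier shuffles overlapping 60-step windows, so near-copies of test sequences can end up in the training set. A chronological split avoids that; here's a minimal sketch using scikit-learn's TimeSeriesSplit on fake data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 100 fake timesteps; TimeSeriesSplit keeps every training index strictly
# before every test index, so overlapping windows can't leak the future
data = np.arange(100)
splits = list(TimeSeriesSplit(n_splits=3).split(data))
for train_idx, test_idx in splits:
    assert train_idx.max() < test_idx.min()  # train always precedes test
print([(len(tr), len(te)) for tr, te in splits])  # [(25, 25), (50, 25), (75, 25)]
```

Swapping this in for train_test_split would give a more honest read on whether the models generalize forward in time.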
In conclusion, using sentiment analysis to predict stock prices didn't work here. Unfortunate. But that doesn't mean this tutorial is useless! Just because we can't use tweets and social media posts to game the stock market doesn't mean our pipeline can't be applied to other things: raw stock data on its own, or completely different datasets. There are plenty of collections of data spanning long periods of time, and a lot of other conclusions to be drawn from them.
Additionally, there were some things we observed without using our model: tweet volume has a positive correlation with a stock's daily trading volume, daily price changes have a slight positive correlation with the number of tweets made that day, and people really, really like tweeting about Tesla.
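Those correlations can be quantified directly with pandas' .corr(); here's a minimal sketch with made-up numbers and hypothetical column names (not the actual columns from our dataset):

```python
import pandas as pd

# Hypothetical daily aggregates; values and column names are illustrative only
df = pd.DataFrame({
    'tweet_count':  [120, 300, 80, 500, 250],
    'volume':       [1.2e6, 2.9e6, 0.9e6, 4.8e6, 2.4e6],
    'price_change': [0.4, 1.1, -0.2, 2.0, 0.9],
})

# Pearson correlation of tweet count against the other daily measures
corrs = df.corr(method='pearson')['tweet_count']
print(corrs)
```

On real data you'd run the same call per ticker (or with method='spearman' if the relationship looks monotone but nonlinear) to put numbers on those observations.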